Abstract

The Qiangic languages in western Sichuan (WSC) are believed to be the oldest branch of the Sino-Tibetan linguistic family, 
and therefore, all Sino-Tibetan populations might have originated in WSC. However, very few genetic investigations have 
been done on Qiangic populations and no genetic evidence for the origin of Sino-Tibetan populations has been
provided. By using the informative Y chromosome and mitochondrial DNA (mtDNA) markers, we analyzed the genetic
structure of Qiangic populations. Our results revealed a predominantly Northern Asian-specific component in Qiangic
populations, especially in maternal lineages. The Qiangic populations are an admixture of the northward migrations of East
Asian initial settlers with Y chromosome haplogroup D (D1-M15 and the later-originated D3a-P47) in the late Paleolithic Age,
and the southward Di-Qiang people with dominant haplogroups O3a2c1*-M134 and O3a2c1a-M117 in the Neolithic Age.
Introduction

The Sino-Tibetan languages are a family of some 460 
languages, including two subfamilies, namely Chinese and 
Tibeto-Burman. They are spoken by over a billion people all
over East Asia and Southeast Asia, and are second only to the
Indo-European languages in terms of the population size of native
speakers [1]. The linguistic connection between Chinese and
Tibeto-Burman is well established. There are over 300 cognates
between Old Chinese and Proto-Tibeto-Burman, grouping them
into the same language family [1]. Based on lexical evidence and
cladistic methods, Wang estimated that Chinese split away from
Tibeto-Burman around 6 thousand years ago (kya) [2]. The
Qiangic languages in western China are believed to be the oldest
type of Sino-Tibetan languages, and to have given rise to all other
Sino-Tibetan languages [1]. Archaeological evidence also
indicates that the ancestors of Sino-Tibetan populations lived
in western China at least 6 kya [1,3].

Despite intensive linguistic and archaeological research, little
is known about how the Sino-Tibetan people dispersed
from western China. During the past two decades, the
characterization of genetic diversity has shed light on the history of
Sino-Tibetan populations, especially the diversity defined by the
maternal mtDNA and the paternal Y chromosome. On the
maternal side, mtDNA evidence suggested a northern Asian
origin of Tibetans, due to the high frequencies of the northern
Asian-specific haplogroups A, D, G, and M8 [4-8]. However, that
evidence has been contradicted by another work [9], which 



showed that the southern Tibeto-Burman populations exhibited 
sex-biased admixture with a stronger influence of northern 
immigrants on the male lineages and a more extensive contribu- 
tion of southern natives to the female lineages. Likewise, the 
southern natives made a greater contribution to the maternal gene 
pool of southern Han Chinese [10]. 

Given that a correlation is emerging that suggests language
change in an already-populated region may require a minimum
proportion of immigrant males, while mtDNA types represent
more ancient settlement [11], the Y chromosome characterization
of the Sino-Tibetan populations may provide valuable insights into
their origins. From the Y chromosome perspective, Su et al. found
that almost all the modern Sino-Tibetan populations shared a
common genetic signature, the high frequencies of O3-M122
lineages, including O3*-M122, O3a2c1*-M134, and O3a2b-M7.
They postulated that the ancient Di-Qiang people
(Proto-Sino-Tibetan speakers) with the dominant O3-lineages in the
upper-middle Yellow River basin were the ancestors of present
Sino-Tibetan populations [12]. However, they did not give a convincing
explanation for the high frequency of the Y chromosomal Alu
insertion (YAP) in Tibetan populations. The YAP polymorphism
is also enriched in Japan and the Andaman Islands, but is basically
absent in almost all the other East Asian populations [13].
Haplogroup D-M174 is one subhaplogroup of YAP+. Shi et al.
proposed that D-M174 had a southern origin and then started its
northward expansion about 60 kya. The current fragmented
distribution of D-M174 was likely due to the later Neolithic
expansion of Han culture carrying O3-lineages [14]. In addition,
one of the O3-M122 lineages in the study of Su et al. [12], haplogroup
O3a2b-M7, was found to be the characteristic lineage of
Mon-Khmer and Hmong-Mien [15]. Haplogroup O3a1c-002611,
which was included in the O3*-M122 haplogroup in the study
of Su et al. [12], comprises almost 17% of Han Chinese [16].
However, haplogroup O3a1c was found at very low frequencies in
Tibeto-Burman populations [17], suggesting that this lineage
might not have participated in the establishment of the
Tibeto-Burman populations. Recently, we have found that Qiang people
have the highest Y chromosomal short tandem repeat (STR)
diversity among the Sino-Tibetan populations in the eastern
Himalayas, indicating the Qiangic group to be the origin of the
Sino-Tibetan expansion [18]. However, the high genetic
diversity of Qiang people might also be the result of repeated
migrations from all directions.

Y chromosome evidence indicates that Qiang people might be
the source of the Sino-Tibetan populations [12,18]. Qiang
people refer to the populations speaking Qiangic languages, a 
group of the northeastern Tibeto-Burman branch, spoken mainly 
in Southwestern China (Figure 1), especially in western Sichuan 
(WSC). Qiangic has more than ten sub-branches, such as Horpa, 
Lavrung, Ersu and Zhaba [19]. The differentiation of the various 
Qiangic languages makes WSC a very important place for 
studying the origin of Sino-Tibetan. Furthermore, WSC is located 
between the upper-middle Yellow River basin and the eastern 
Himalayas, probably serving as a conduit for gene flow during the 
origin of the Sino-Tibetan populations. Here, we integrate Y 
chromosome and mtDNA diversity in Qiangic populations located 
in the WSC corridor to provide a broader framework for 
reconstructing the history of Sino-Tibetan. 



Materials and Methods 

Population samples 

We collected blood samples of 407 healthy and unrelated
individuals from four Qiangic populations in western Sichuan
province (Figure 1). Our study was approved by the Ethics
Committee of the School of Life Sciences, Fudan University. All
individuals were adequately informed and signed their informed
consent before their participation. The populations were labeled as
follows: Horpa-Danba (DB), 47 Horpa individuals from Danba
County of Sichuan; Horpa-Daofu (DF), 43 Horpa individuals from
Bamei Town, Daofu County of Sichuan; Tibetan-Xinlong (XL),
124 Khams Tibetans from Xinlong County of Sichuan;
Tibetan-Yajiang (YJ), 193 Khams Tibetans from Hekou Town, Yajiang
County of Sichuan. Genomic DNA was extracted using the DP-318
Kit (Tiangen Biotechnology, Beijing).

Y chromosome markers 

The samples were typed through seven panels of 100 SNPs as 
listed in the latest Y chromosome phylogenetic tree [16,20]. 

Haplogroup O panel: M175, M119, P203, M110, M268, P31,
M95, M176, M122, M324, M121, P201, M7, M134, M117,
002611, P164, L127 (rs17269396), and KL1 (rs17276338).

Corset Panel: M130, P256, M1, M231, M168, M174, M45,
M89, M272, M258, M242, M207, M9, M96, P125, M304, M201
and M306.

Haplogroup C panel: P54, M105, M48, M208, M407, P33,
M93, P39, P92, P53.1, M217, M38, M210, M356, P55, and
M347.

Haplogroup D panel: P47, N1, P99, M15, M125, M55, M64.2,
M116.1, M151, N2, and 022457.




Figure 1. Geographic locations of Qiangic and other referenced East Asian populations in this study. (a) Geographic location of WSC
and distributions of the East Asian populations used in data analysis; (b) Detailed geographic location of studied Qiangic-speaking populations. The
number of individuals sampled in each population is enclosed in parentheses.
doi:10.1371/journal.pone.0103772.g001
Haplogroup N panel: M214, LLY22g, M128, M46/Tat, P63,
P119, P105, P43, and M178.

Haplogroup R panel: M306, M173, M124, M420,
SRY10831.2, M17, M64.1, M198, M343, V88, M458, M73,
M434, P312, M269, and U106/M405.

Haplogroup Q panel: P36.2, M3, M120, MEH2, M378,
N14/M265, M25, M143, M346, L53, and M323.

These binary markers were hierarchically genotyped by
SNaPshot (ABI SNaPshot Multiplex Kit) and fluorescent
allele-specific PCR. PCR products were electrophoresed on a 3730xl
Genetic Analyzer (Applied Biosystems, Carlsbad, CA).

Seventeen Y chromosomal STRs (DYS19, DYS389I, 
DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a, 
DYS385b, DYS438, DYS439, DYS437, DYS448, DYS456, 
DYS458, DYS635 and YGATAH4) were amplified using the 
AmpFlSTR Yfiler PCR Amplification kit (Applied Biosystems, 
Carlsbad, CA, USA). Amplified products were separated and 
detected using the ABI 3730xl Genetic Analyzer (Applied
Biosystems, Carlsbad, CA, USA) according to the manufacturer's
recommended protocol. The data were analyzed using
GeneMapper ID v3.2 (Applied Biosystems, Carlsbad, CA, USA). For
use in the analyses, DYS389II was calculated by subtracting the 
DYS389I allele size. 
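
The DYS389II adjustment can be sketched in a few lines of Python. This is only an illustration, assuming that alleles are stored as repeat counts in a dictionary keyed by locus name; the function name and the example allele calls are hypothetical and are not taken from the study data.

```python
def adjust_dys389ii(haplotype):
    """Return a copy of a Y-STR haplotype with DYS389II reduced by DYS389I.

    The DYS389II amplicon also contains the DYS389I repeat block, so the
    independent contribution is DYS389II minus DYS389I.
    """
    adjusted = dict(haplotype)  # leave the caller's record untouched
    adjusted["DYS389II"] = haplotype["DYS389II"] - haplotype["DYS389I"]
    return adjusted


# Hypothetical raw allele calls (repeat counts), for illustration only.
raw = {"DYS19": 15, "DYS389I": 12, "DYS389II": 28, "DYS390": 21}
print(adjust_dys389ii(raw)["DYS389II"])  # 28 - 12 = 16
```

Working on a copy keeps the raw genotype calls available for quality checks while the adjusted values are passed on to network and distance analyses.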

Mitochondrial DNA markers 

The hypervariable segment I (HVS-I) of the control region was 
amplified by primers L15974 and H16488 [7]. Purified PCR 
products were sequenced using the BigDye terminator cycle 
sequencing kit and an ABI 3730XL genetic analyzer (Applied 
Biosystems, Carlsbad, CA, USA). A SNaPshot assay was used for 
typing SNPs in the coding regions to confirm haplogroup identity. 
This assay was designed as a multiplex panel including 21 coding
region SNPs and one length variation marker [5]. Both the HVS-I
motif and the coding region variations were used to infer
haplogroups. In addition, three representative mtDNAs (BM024,
DBB005, and DBB006) were completely sequenced using the
method described in our previous work [5]. The nomenclature
of mtDNA follows van Oven and Kayser [21], with the latest
modifications (http://www.phylotree.org/). The mtDNA
sequences have been deposited in GenBank with accession
numbers KJ783504-KJ783899.

Statistical analyses 

Principal component analysis (PCA) was performed using SPSS 
18.0 software (SPSS, Chicago, IL, USA). Networks of Y 
chromosomal STR data and the mtDNA HVS-I motifs were 
constructed by the reduced median-joining method [22] using
NETWORK v. 4.5.1.6 (Fluxus-engineering.com). Molecular
diversity, population structure estimates and Y-STR genetic
distances between populations were calculated using Arlequin v.
3.11 [23]. Classical frequency spectrum tests, such as Tajima's D,
Fu and Li's D, D*, F and F*, were calculated using DnaSP 5.0 to
detect deviation from neutrality [24-27]. Coalescence times of
mtDNA haplogroups of interest were calculated by the ρ statistic
method [28,29] using a recently corrected, calibrated mutation rate:
18,845 years per mutation in HVS-I (16090-16365) [30].
Reference population data on the Y chromosomes [14,18,31-44]
and mtDNA [8-10,45-69] were retrieved from the literature.
Time estimations for main Y chromosomal lineages were made
using 15 STRs (excluding DYS385a and DYS385b) in
BATWING [70] under a model of exponential growth from an initially
constant-sized population. The parameters used in estimation
followed Xue et al. [44]. Four sets of Y-STR mutation rates
were applied in time estimations, as in Wei et al. [71]. These are a



widely used evolutionary mutation rate (EMR) [72], two observed 
genealogical mutation rates (OMRB and OMRS) [73,74], and a 
genealogical mutation rate adjusted for population variation using 
logistic model (lmMR) [73]. A total of 10^4 samples of the
program's output representing 10 MCMC cycles were taken after
discarding the first 3×10^5 samples as burn-in. The Time to the
Most Recent Common Ancestor (TMRCA) is calculated using the
product of the estimated population size N and the height of the
tree T (in coalescent units) [70]. A generation time of 25 years was
used to produce a time estimate in years. The geographic
distributions of Y chromosome haplogroups D1 and D3a are
presented as contour maps generated using Surfer 8.0
Software (Golden Software).
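
The conversion from the sampled quantities to a TMRCA in years can be illustrated with a short Python sketch. It assumes that each retained MCMC sample supplies a pair (N, T) as defined above; the function names and the three example samples are hypothetical, and taking the median across samples is our choice for illustration rather than a description of the program's own summary.

```python
from statistics import median

GENERATION_YEARS = 25  # generation time stated in the text


def tmrca_years(pop_size, tree_height, generation_years=GENERATION_YEARS):
    """Convert one posterior sample (N, T) into a TMRCA in years.

    N * T is the tree height in generations when T is expressed in
    coalescent units; the generation time converts it to years.
    """
    return pop_size * tree_height * generation_years


def median_tmrca(samples):
    """Median TMRCA (years) over an iterable of (N, T) samples."""
    return median(tmrca_years(n, t) for n, t in samples)


# Hypothetical posterior samples of (N, T); not values from this study.
posterior = [(1200, 0.20), (900, 0.31), (1500, 0.18)]
print(f"median TMRCA: {median_tmrca(posterior) / 1000:.2f} kya")
```

Because N and T are correlated in the posterior, the product is formed sample by sample before summarizing, rather than multiplying separate summaries of N and T.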

Results 

Y chromosome 

Y chromosome haplogroup profile. According to the 
nomenclature of Y Chromosome Consortium (YCC) [16,20], 23 
SNP haplogroups were determined from the 127 male individual 
samples (Figure 2a, Table S1, and Table S2). Haplogroup
D1-M15 and its subhaplogroups, which are widely distributed across
East Asia including most of the Tibeto-Burman, Tai-Kadai and
Hmong-Mien speaking populations [4,14,75] (Figure S1 in Doc
S1), are also prevalent in the four studied populations (44.44% and
12.50% in Horpa-Danba and Horpa-Daofu, respectively; 8.70%
in Tibetan-Xinlong and 6.38% in Tibetan-Yajiang). Haplogroup
D3a-P47 is almost exclusively distributed in Tibeto-Burman
populations [4,14,75] (Figure S1 in Doc S1) and is also found at high
frequencies in Horpa-Daofu, Tibetan-Xinlong and Tibetan-Yajiang,
but is absent in Horpa-Danba. Haplogroup O1a1-P203, which
occurs at high frequencies in Tai-Kadai speaking people along the 
southeast coast of China and Taiwan aborigines [16,75], is also 
observed at a high frequency in Yajiang (21.28%) and moderate 
frequencies in Daofu and Xinlong (6.25% and 8.70%,
respectively), but is absent in Danba. The major lineages in the Indo-China
Peninsula, O2a1-M95 and its subhaplogroups, are also found at
moderate or relatively low levels in the four studied populations.
Haplogroup O3-M122 is the most common haplogroup in China
and prevalent throughout East and Southeast Asia, comprising
roughly 25-37% of the studied Qiangic populations.
O3a1c-002611, O3a2c1-M134, and O3a2c1a-M117 are three main
subclades of O3, each accounting for 12-17% of the Han Chinese
[16,75]. However, their frequencies vary widely in Qiangic
populations. O3a1c-002611 comprises 15.22% of Xinlong
Tibetans, but is absent in the three other populations. O3a2c1*-M134
accounts for about 6% of the Horpa-Danba and Tibetans of
Xinlong and Yajiang, but is absent in Horpa-Daofu. Haplogroup
O3a2c1a-M117, which exhibits high frequencies in other
Tibeto-Burman populations, is also observed at high frequencies in
Horpa-Danba and Tibetan-Yajiang (22.22% and 19.15%,
respectively), and moderate frequencies in Horpa-Daofu and
Tibetan-Xinlong (12.50% and 10.87%, respectively). Haplogroup C-M130
has a very wide distribution and might represent one of the earliest
settlements in East Asia. Haplogroup C* (M130+, M105-, M38-,
M217-, M347-, and M356-) has been found at low
frequencies along the southern coast of mainland East Asia as 
well as throughout the islands of Southeast Asia [75,76]. In spite of 
the wide distribution of C*, they all have similar STR haplotypes 
(DYS19, 15; DYS389I, 12; DYS389b, 16; DYS390, 21; DYS391, 
10; DYS392, 11; DYS393). There are two C* individuals detected
in this study, one in Horpa-Danba and the other in Tibetan- 
Xinlong. Those two individuals also have the same STR haplotype 
as mentioned above. Haplogroup C3-M217 is the most
widespread subclade of C-M130, and reaches the highest frequencies
among the populations of northern East Asia, especially in
Mongolians [75-77]. Haplogroup C3-M217 has also been found
in Tibetan-Yajiang at a frequency of 10.64%, but is absent in
the other three populations. Haplogroup N-M231 has both a unique
and widespread distribution throughout northern Eurasia and
reaches its highest frequency among most of the Uralic populations as
well as some Altaic populations. Haplogroup N1c1a-M178 is the
most common subclade of N-M231 and is thought to have originated
in China [75,78]. N1c1a-M178 has also been detected in
Horpa-Daofu and Tibetan-Xinlong at 12.50% and 2.17%, respectively.
The 17-STR haplotype of N1c1a individuals in Horpa-Daofu is
exactly the same as that of some Komi people in Russia [79,80].
However, the haplotype of the N1c1a individual in Xinlong shows
more similarity with samples of its surrounding populations
(unpublished data). It is particularly noteworthy that
Central-South Asia-related haplogroups J-M304 and R2-M124 [81] have
also been detected at low frequencies in Qiangic populations.

PCA and STR genetic distance analysis. The paternal 
genetic relationships among Qiangic, Tibeto-Burman, and other 
East Asian populations were discerned with the aid of additional 
published Y chromosome datasets. We used a PCA based on the 
distribution of Y chromosome haplogroup frequencies of 51 
populations to show the overall clustering pattern (Figure 3a, 
Table S3). Results of PCA are presented by the plots of the first 
two principal components (PCs), which together account for 
31.31% of the Y chromosome variation in these populations. The 
first PC revealed a clear north-south geographic division between 
Altaic and Sino-Tibetan, Tai-Kadai and Hmong-Mien. Haplogroups
C3-M217, G-M201, J-P209, and R-M207 were found to
contribute most to the northern pole of Altaic. Haplogroup
O-M175 contributed most to the southern pole. Sino-Tibetan,
Tai-Kadai and Hmong-Mien populations showed different
distributions along the second PC. Horpa-Danba, Horpa-Daofu,
Tibetan-Xinlong, and Tibetan-Yajiang were clustered within the Sino-Tibetan
group, which reflected a clear linguistic clustering pattern.



Haplogroups O3a1c-002611, O3a2c1*-M134, and
O3a2c1a-M117 contributed most to the Sino-Tibetan pole. In contrast,
haplogroups O3a2b*-M7 and O2a1-M95 were concentrated at
the Tai-Kadai and Hmong-Mien pole. The four western Sichuan
populations clustered tightly together with other Tibeto-Burman
populations, such as Qiang, Tibetan-Yunnan, Yi, and Tujia,
mostly due to high frequencies of haplogroups D3a-P47,
O3a2c1a-M117, D1-M15, and O3a2c1*-M134. In the STR genetic distance
based neighbor-joining tree, Horpa-Daofu, Tibetan-Yajiang, and
Tibetan-Xinlong also clustered tightly with Tibeto-Burman
populations. However, Horpa-Danba was closely related to Han
and Hmong-Mien populations (Figure S2 in Doc S1). As PCA was
performed from frequencies of haplogroups and genetic distance
was obtained from only 6 STR markers (Table S4), the results are
suggestive but not conclusive.
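
To make the frequency-based PCA concrete, the following Python sketch extracts the first principal component from a small population-by-haplogroup frequency matrix. It is a toy illustration only: the analysis in this study was run in SPSS on 51 populations, whereas the sketch uses just the four Qiangic populations and four haplogroups whose frequencies are quoted in the text (with "absent" coded as 0), and it substitutes a plain power iteration on the covariance matrix for a statistics package. The function name is hypothetical, and the output is not expected to reproduce Figure 3a.

```python
import math

populations = ["Horpa-Danba", "Horpa-Daofu", "Tibetan-Xinlong", "Tibetan-Yajiang"]
haplogroups = ["D1-M15", "O1a1-P203", "O3a2c1a-M117", "C3-M217"]
# Frequencies (%) quoted in the text; rows follow `populations`.
freqs = [
    [44.44, 0.00, 22.22, 0.00],
    [12.50, 6.25, 12.50, 0.00],
    [8.70, 8.70, 10.87, 0.00],
    [6.38, 21.28, 19.15, 10.64],
]


def first_principal_component(rows, iterations=500):
    """Return (loadings, scores, explained_fraction) for PC1."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the haplogroup frequencies.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores, eigenvalue / total_variance


loadings, scores, explained = first_principal_component(freqs)
print(f"PC1 explains {explained:.1%} of the variance in this toy matrix")
for name, score in zip(populations, scores):
    print(f"{name:16s} PC1 score = {score:7.2f}")
```

The loadings indicate which haplogroups drive the separation along the component, which is how statements such as "haplogroup X contributed most to the northern pole" are read off a PCA of haplogroup frequencies.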

Network analysis and time estimation. To discern the
detailed relationships among the D3a-P47, O3a2c1a-M117,
D1-M15, and O3a2c1*-M134 haplogroups in Tibeto-Burman and
other related populations, a median-joining network was
constructed based on Y-STR haplotypes of those haplogroups
(Figure 4). A clear Sino-Tibetan vs. Tai-Kadai and Hmong-Mien
divergence can be inferred from the network of D1-M15, though
sporadic haplotype sharing exists. Furthermore, within the
Sino-Tibetan populations, haplogroup D1-M15 contains distinct STR
haplotypes between Qiangic populations, Northern Han, and
Tibetan-Tibet, implying that D1-M15 experienced a series of
founder effects or strong bottlenecks and a secondary expansion in
Sino-Tibetan populations. In the network of D3a-P47, a
divergence of Qiang and Tibetan from other
Tibeto-Burman populations has been observed. Other Tibeto-Burman
populations only have a subset of the Qiang and Tibetan
haplotypes. The star-like network of D3a-P47 also suggests
population expansion in Tibetans. The network of
O3a2c1*-M134 shows a clear divergence between Tibetan and northern
populations (Northern Han and Altaic). Southern Han and
Tai-Kadai samples constitute the center of the network and act as a
Figure 2. Y chromosome and mtDNA haplogroup frequencies of studied Qiangic populations. (a) Y chromosome haplogroup
frequencies of the four Qiangic populations; (b) mtDNA haplogroup frequencies of the four Qiangic populations.
doi:10.1371/journal.pone.0103772.g002

Figure 3. Phylogenetic relationship between Qiangic and reference populations analyzed by PCA with the frequencies of
haplogroups. (a) PCA plot based on Y chromosome haplogroup frequencies of 51 populations; (b) PCA plot based on mtDNA haplogroup
frequencies of 72 populations.
doi:10.1371/journal.pone.0103772.g003



bridge connecting Tibetan and northern populations, which
supports the southern origin and northern expansion of
O3a2c1*-M134. Most of the Qiangic samples belonging to
haplogroup O3a2c1*-M134 share haplotypes with northern
populations, indicating a recent gene flow from northern
populations to Qiangic populations. A population expansion has
also been observed in the star-like network of haplogroup
O3a2c1a-M117. However, the haplotypes of O3a2c1a-M117
are extensively shared among all the East Asian populations.

We then estimated the coalescence and expansion time of Y 
chromosome lineages in Qiangic populations (Table 1). The ages 
estimated using the evolutionary rate are about two to three times
higher than those using genealogical rates. As the times using
genealogical rates fit well with sequence-based estimates in Y
chromosome lineage dating [82], we present results from the
genealogical calculations in the following section. Haplogroup D
can be traced back to the late Paleolithic period, while the other
subhaplogroups more likely coalesced in the Neolithic. The
lineage expansion times all fall within the Neolithic, ranging from
4.2 to 7.5 kya.

MtDNA 

MtDNA haplogroup profiles, population summary statistics, and PCA analysis. MtDNA HVS-I sequences of
396 individuals from the four studied Qiangic populations have 
been successfully typed. A total of 214 different haplotypes were 
defined by 134 polymorphic sites in the HVS-I dataset. The 
haplotype diversity of those Qiangic groups ranged from 0.978 to 
0.994, with the lowest haplotype diversity observed in Horpa- 
Daofu (0.978) and the highest in Horpa-Danba (0.994). The mean 



number of pairwise differences (MNPD) and nucleotide diversity 
(ND) show a similar pattern with the haplotype diversity, as the 
highest diversity was observed in Horpa-Danba and the lowest in 
Horpa-Daofu. However, Tibetan-Yajiang has a higher haplotype
diversity but lower MNPD and ND than
Tibetan-Xinlong. Measures of population growth (Tajima's D, Fu's Fs, Fu
and Li's D*, and Fu and Li's F*) all gave negative values for
each population, but Tajima's D, Fu and Li's D* and F* were not
statistically significant in Horpa-Daofu (Table 2). The
non-significant growth statistics and the lowest diversities of
Horpa-Daofu might be the result of a small sample size and/or
genetic drift.
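
For readers unfamiliar with the haplotype diversity values quoted above, the sketch below shows how such a statistic is commonly computed. It assumes Nei's unbiased estimator, h = n/(n-1) × (1 - Σp_i²), where p_i is the frequency of haplotype i among n sequences; the function name and the example haplotype labels are hypothetical, and the study's own values were produced with Arlequin rather than with this code.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample: six sequences falling into four HVS-I haplotypes.
sample = ["H1", "H1", "H2", "H3", "H3", "H4"]
print(f"h = {haplotype_diversity(sample):.3f}")
```

Values close to 1, such as the 0.978 to 0.994 range reported here, mean that two sequences drawn at random from a population almost always carry different haplotypes.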

397 samples were successfully assigned to mtDNA haplogroups 
using a combination of HVS-I sequence motifs and single 
nucleotide polymorphisms (SNPs) distributed around the coding 
region of the mtDNA genome. A total of 79 haplogroups or 
paragroups (unclassified lineages within a clade marked with an 
asterisk [*]) were identified (Figure 2b, Table S1 and Table S2), all
within the two principal out-of-Africa macrohaplogroups: M and N
(including R). Macrohaplogroup M and its subhaplogroups
comprise 59.70% of the Qiangic maternal gene pool, and
macrohaplogroup N and its subhaplogroups comprise the remaining
40.30%. The most prevalent haplogroups within
macrohaplogroup M, haplogroups D and G, represent 18.14% and
13.60% of all the samples. Within macrohaplogroup N,
haplogroups A and F are the most common lineages, accounting for
13.60% and 10.58% of Qiangic, respectively. The majority of the
mtDNA lineages belong to eastern Eurasian-specific groups,
including those from Northeast Asia (A, D4, D5, G, C, and Z)
[83-85] and Southern China or Southeast Asia (B, F, M7, and R9)
Figure 4. Reduced median-joining network of Y chromosome haplogroups. Reduced median-joining network based on data from six Y-STRs
(DYS19, DYS389I, DYS390, DYS391, DYS392, and DYS393) of haplogroups D1-M15, D3a-P47, O3a2c1*-M134, and O3a2c1a-M117.
doi:10.1371/journal.pone.0103772.g004



[54]. Only two U samples in Yajiang might trace their
origins to western or southern Eurasia, comprising 0.5% of
Qiangic. The frequencies of Southern China or Southeast
Asia-specific haplogroups in Horpa-Danba, Horpa-Daofu,
Tibetan-Xinlong, and Tibetan-Yajiang are 26.09%, 22.50%, 27.73%, and
21.35%, respectively. However, Tibetan-Yajiang, Horpa-Danba,
Horpa-Daofu and, to a lesser extent, Tibetan-Xinlong, display a
considerable Northeast Asian proportion of lineages (56.77%,
56.52%, 55.00%, and 43.70%, respectively). Consistent with other
studied Tibetan populations on the Tibetan Plateau, Qiangic
populations also showed a strong similarity with Northeast Asian
populations.

We performed a PCA using the mtDNA haplogroup
frequencies of Qiangic groups in this study and 68 other populations to see
the detailed genetic patterns of those populations (Figure 3b,
Table S3). The first PC revealed a clear geographic division
between northern populations (Altaic and Northern Han) and
southern populations (Southern Han, Tai-Kadai, and
Hmong-Mien). Qiangic groups were clustered in the northern pole due to



the high frequencies of haplogroups A and G. Han Chinese and
Tibeto-Burman populations showed significantly different
distributions in the second PC. Qiangic populations were clustered
within the Tibeto-Burman group due to the existence of haplogroups
M9a'b and M13.

Phylogeography of Macrohaplogroup M. Macrohaplogroup M and its subhaplogroups represent the
majority of the Qiangic maternal lineages, with frequencies 
ranging from 65.22% in Horpa-Danba to 57.98% in
Tibetan-Xinlong. Haplogroups D4 and G are the most frequent sub-clades
of macrohaplogroup M in Qiangic populations, each comprising
13.60%. Haplogroup D4, which is prevalent throughout Central
Asia [85], Northeast Asia [86,87], and Southwest China
[5,8,65,66], represents the majority of haplogroup D samples in
Horpa-Danba (17.39%), Tibetan-Yajiang (13.54%),
Tibetan-Xinlong (13.45%), and Horpa-Daofu (10.00%). The haplotypes of
D4* were extensively shared among Qiangic, Tibetan, Han
Chinese, and Altaic (Figure 5). Specifically, sub-haplogroup D4j3
was detected in Horpa-Danba and Horpa-Daofu with
considerable frequencies (4.35% and 5.00%, respectively). The age
estimates generated for D4* and D4j3 in Qiangic were about
15 kya (Table 3). In addition, the population growth statistic, Fu's
Fs, was significantly negative for haplogroups D4* and D4j3
(Table 4), implying post-LGM expansions of those two
lineages in Qiangic.
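
Age estimates of this kind follow from the ρ (rho) statistic named in Materials and Methods: ρ is the average number of mutations separating the sampled sequences from the root haplotype of the clade, and the age is ρ multiplied by the mutation rate of 18,845 years per HVS-I mutation. The Python sketch below shows only this point estimate, assuming that the mutational distance of each sequence from the root has already been counted from the network; the function name and the example distances are hypothetical and do not correspond to Table 3.

```python
YEARS_PER_MUTATION = 18_845  # HVS-I (16090-16365) rate quoted in the text [30]


def rho_age(distances_from_root, years_per_mutation=YEARS_PER_MUTATION):
    """Return (rho, age_in_years) for a clade.

    `distances_from_root` holds, for every sampled sequence, the number of
    HVS-I mutations separating it from the inferred root haplotype.
    """
    if not distances_from_root:
        raise ValueError("at least one sequence is required")
    rho = sum(distances_from_root) / len(distances_from_root)
    return rho, rho * years_per_mutation


# Hypothetical clade of eight sequences; not data from this study.
distances = [0, 0, 1, 1, 1, 2, 0, 1]
rho, age = rho_age(distances)
print(f"rho = {rho:.2f}, age = {age / 1000:.1f} kya")
```

A confidence interval would additionally require the standard error of ρ, which depends on the branch structure of the network and is left out of this sketch.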

Haplogroup G is found at high frequencies in northeastern 
Siberia but it is also common among populations of the Japanese
Archipelago and the Korean Peninsula. This haplogroup also
comprises an average of 20% of the maternal gene pool of the
Tharus from Nepal [88] and accounts for more than 10% in the
Tibetan populations of Nagqu, Chamdo, Lhasa, Garze, and
Monba [5]. In this study, haplogroup G and subhaplogroups G2a,
G2b1b, G3, and G3a1 account for 20% of Horpa-Daofu and
reach frequencies greater than 10% in three other Qiangic
populations. Subhaplogroup G2a is represented as four distinct
HVS-I motif types: 16129-16223-16278-16362 (I), frequent in
Tibetan and Southern Han but nearly absent in Altaics;
16223-16227-16278-16362 (II), frequent in all the above three
populations and probably experienced population expansion in
Altaics (Figure 5); 16193-16223-16278-16362 (III), exclusive in
South Asia. All of the G2a samples in Horpa-Daofu harbor
haplotype II but add one more mutation at site 16304. However,
most of the Tibetan-Xinlong samples belong to haplotype I (50%).
Subhaplogroup G2b1b was first reported as a novel haplogroup in
northeast India and has a low-frequency distribution in Tibet and
surrounding regions [89,90]. This haplogroup accounts for 4.69%,
2.50%, and 0.84% of Tibetan-Yajiang, Horpa-Daofu, and
Tibetan-Xinlong, respectively. Compared with other Tibetan samples, 72.73% of
Qiangic G2b1b samples were detected with a mutation at site
16356, thus forming some exclusive clades in the network
(Figure 5). Subhaplogroup G3 comprises 6.77%, 5.00%, 3.36%,
and 2.17% of Tibetan-Yajiang, Horpa-Daofu, Tibetan-Xinlong,
and Horpa-Danba, respectively. Two Yajiang samples are further
defined as G3a1 by a mutation at site 16215. In addition, we have
found two Horpa-Danba G2a samples bearing both G2a (16278)
and G3 (16274) characteristic mutations and thus we could not tell
the exact haplogroup classification of those two samples. The
coalescence time estimates of G*, G2b1b, and G3 were all around
20 kya and the age of G2a even reached about 34 kya (Table 3).
However, it is noteworthy that the arrival time of these
haplogroups at the Tibetan Plateau might be somewhat more
recent than their coalescent ages would indicate, because nearly all
these haplogroups (except G2b1b) had already differentiated
before their arrival on the plateau (Figure 5). The exclusive clades
in the network (Figure 5) and the significantly negative Fu's Fs
values (Table 4) of G2a and G3 suggest the probable isolation and
secondary population expansion of the two lineages.

Haplogroup M8 has two sublineages, haplogroup C and Z. 
Haplogroup C is a common lineage, which is widespread in East 
Asia and Siberia and is one of the founder lineages among Native 
Americans [6]. Haplogroup C comprises 8-10% of Horpa-Danba
and Tibetan-Yajiang, but was detected at a very low frequency or
was even absent in Tibetan-Xinlong and Horpa-Daofu. Almost 60% of
the C samples in the present study harbored a specific HVS-I motif
16093-16298-16327 and were assigned as C4d. One
Horpa-Danba individual with HVS-I motif 16298-16327 is also classified
as C4d through complete sequencing (Doc S2). Haplogroup C4d
has been supposed to be Tibetan-specific, with frequencies ranging
from 1.6% to 5.0% in populations of Tibet [5]. However, the
frequency of C4d in Tibetan-Yajiang even reaches 6.25%. In
addition, all the reported C4d samples in Tibet and Qinghai have
the same motif as mentioned above. However, 25% of the C4d
samples in Yajiang share another mutation at site 16111. About
23% of C samples in Qiangic with a mutation at site 16357 might
be assigned as C4a2'3'4, which is also restricted to
Tibeto-Burman populations. Haplogroup Z is observed at relatively low
frequencies in Qiangic populations.

M9a'b is widely distributed in mainland East Asia [89] and 
Japan, and reaches its greatest frequency and diversity in Tibet 
[5,8] and its surrounding regions, including Nepal [88] and 
northeast India [90,91]. It has been proposed recently that
haplogroup M9a'b had most likely originated in southern China
and/or mainland Southeast Asia. After the LGM, M9a'b might have been
involved in some northward migrations in mainland East Asia
[60]. In the present study, the frequencies of M9a'b in
Horpa-Danba, Horpa-Daofu, Tibetan-Xinlong, and Tibetan-Yajiang are
4.35%, 10%, 13.45%, and 6.77%, respectively. Most M9a*
samples (62.5%) of Qiangic shared the main haplotype that
clustered in the central largest clade with other Tibeto-Burman
populations in the network. However, the estimated age of M9a* is
relatively young at about 7 kya. M9b is largely restricted to the
non-Tibetans in southern China and southwest China [60]. We
have detected low frequencies of M9b in Horpa-Danba and
Tibetan-Xinlong (2.17% and 0.84%, respectively). In the networks
of M9a1a and M9a1b, most of the Qiangic samples shared the
descent types, giving a clear signal of out-of-Tibet migrations of
those haplogroups. The age estimates generated for M9a1a and
M9a1b1 in Qiangic were around 12-13 kya (Table 3), consistent
with the proposed post-glacial dispersal of the M9a'b lineages.

Haplogroup M13a has been found at its greatest frequency and
diversity in Tibet, but it has also been detected at very low
frequencies in Siberian Buryat, Yakut, Altaian Kazakh, and
Ewenki [85], and in central Asian Kirghiz [92] as well as Barghuts
[84,93,94]. The frequency of haplogroup M13a in Qiangic
populations is notable, accounting for 3.27% of all samples.
In the networks of haplogroups M13a1 and M13a2, Qiangic and
Tibeto-Burman samples formed some almost exclusive clades.
This strongly suggests that these specific lineages have de novo
origins within Tibetans. Notably, 70% of the subhaplogroup M13a1b
samples in Qiangic share the same haplotype. A coalescence time
estimate for M13a1b corresponded to 5.7 kya (Table 3), suggesting
a relatively recent Neolithic expansion out of Tibet and an even
more recent arrival of this lineage in northern Asia.
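
To illustrate how a coalescence age of the kind reported for these lineages can be obtained, the sketch below computes the rho statistic, i.e. the mean number of mutations separating the sampled haplotypes from an assumed founder haplotype, and scales it by a mutation rate. This section does not state which calibration or software was used, so the function name `rho_age`, the parameter `years_per_mutation`, the star-like approximation for the standard error, and the toy haplotypes are all assumptions made for illustration.

```python
from math import sqrt


def rho_age(founder, haplotypes, years_per_mutation):
    """Return (rho, age_in_years, standard_error_in_years).

    founder and each haplotype are collections of variant positions.
    The distance to the founder is the number of positions at which the
    two differ (no recurrent mutation is assumed).
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    founder = frozenset(founder)
    distances = [len(founder ^ frozenset(h)) for h in haplotypes]
    n = len(distances)
    rho = sum(distances) / n
    # Assumption: star-like genealogy, so Var(rho) is approximated by rho / n.
    sigma = sqrt(rho / n)
    return rho, rho * years_per_mutation, sigma * years_per_mutation


if __name__ == "__main__":
    # Toy data only; the rate below is a placeholder, not the paper's value.
    toy = [{16223}, {16223, 16311}, {16223, 16311, 16362}, {16223}]
    print(rho_age({16223}, toy, years_per_mutation=20000))
```

A lineage in which most samples still carry the founder haplotype, as described for M13a1b, yields a small rho and therefore a young age estimate.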

Qiangic populations also exhibit some basal Eurasian mtDNA
lineages. Haplogroup M62, for example, was first reported in
Northeast India [90] and has since been reported at low frequency
in several populations throughout Tibet [5,8]. Zhao et al.
suggested that M62 might represent the genetic relics of the initial
Late Paleolithic settlers (>21 kya) on the Tibetan Plateau. In this
study, we observed haplogroup M62b in three Yajiang Tibetans.
The haplotype of those three individuals differs from all other
reported M62 samples by a mutation at site 16305. Likewise,
haplogroup M74a was detected in one Xinlong Tibetan, whose
haplotype bears a distinctive mutation at site 16274 that is
shared only with one Maonan individual, one Zhuang individual,
and one Hainan Han Chinese [52]. Haplogroup M33c was found
in a Tibetan sample from Yajiang with a haplotype similar to that
of some Hmong-Mien samples [52].

Phylogeography of Macrohaplogroup N. Haplogroup R
and its subhaplogroups (B and F) represent the majority of the
lineages branching from the basal N trunk, accounting for
26.09%, 22.50%, 28.57%, and 23.44% of the maternal diversity
in Horpa-Danba, Horpa-Daofu, Tibetan-Xinlong, and Tibetan-
Yajiang, respectively. Subhaplogroup B4* is the most frequent
lineage of haplogroup B in Qiangic, comprising 4.53% of all the
samples. In the network of B4*, the root clade is composed almost
exclusively of non-Tibeto-Burman samples, whereas the
Tibeto-Burman samples only formed some small clusters or shared the
terminal types, suggesting that B4* had already differentiated
before its arrival in Tibet. Subhaplogroup F1* is the most frequent
lineage of haplogroup F in Qiangic, accounting for 5.54% of all
the samples and comprising as much as 12.5% of Horpa-
Daofu. The age estimate generated for F1* in Qiangic was around
5 kya (Table 3). The exclusive Qiangic cluster of F1* in the
network suggests a strong bottleneck or founder effect in its
Neolithic migration towards the plateau. The significant negative
values of the growth factor estimates (Table 4) suggest a secondary
expansion and probable selection of the F1* lineage during its
adaptation to the plateau.

Table 4. Growth summary statistics and frequency spectrum tests for deviation from neutrality.

Haplogroup   Tajima's D   Fu's Fs    Fu and Li's D   Fu and Li's D*   Fu and Li's F   Fu and Li's F*
A*           -1.118       -0.491     -0.200          -0.814           -0.578          -1.020
A4           2.184a       6.181a     -1.304          3.020b           -1.927          3.237b
B4           -1.431       6.724a     2.030c          2.165c           2.124c          2.264c
B5b          -0.065       -1.793     0.479           0.081            0.420           0.054
C*           -0.229       0.146      0.679           -0.125           0.572           -0.176
D4           2.079b       5.692a     3.457b          3.423b           3.576b          3.519b
D4j3         1.576c       3.082b     -1.619          -1.651           -1.899          -1.812
D5           -1.035       -1.262     -1.042          -1.193           -1.083          -1.264
D5a2a        -1.462       -1.776     -1.498          -1.581           -1.773          -1.762
F1           2.324a       3.409b     3.430b          3.637b           3.593b          3.781b
F1a          -0.599       -1.964     -0.078          -0.598           0.000           -0.649
G*           1.809b       -1.637     2.253b          1.940b           2.478b          2.119b
G2a          -0.915       3.027b     -0.748          -1.275           -0.771          -1.344
G2b1b        -1.270       -1.583     -0.995          -1.581           -1.161          -1.705
G3           -1.121       4.928a     0.926           -0.405           0.529           -0.707
M7b          -0.590       -0.138     -0.246          -0.525           -0.426          -0.592
M9a          -1.541       -0.911     -0.888          -1.596           -1.194          -1.766
M9a1a        1.680a       -1.588     -1.672          2.492b           -1.767          2.614b
M9a1b1       -1.043       0.627      -0.207          -0.971           -0.516          -1.081
M10          -0.963       -0.943     0.320           -0.871           0.190           -0.975
M13a1b       1.766b       0.505      -0.410          1.942c           -0.555          2.145c
N9a          -0.886       2.774c     -0.419          -0.851           -0.526          -0.927
R*           -1.534       7.523a     2.170c          -1.787           2.371c          -1.960

a means P<0.01,
b means P<0.05,
c means 0.05<P<0.1. A haplotype L3e is used as an outgroup when calculating Fu and Li's D and F.
* is a part of the parameter name or haplogroup name.
doi:10.1371/journal.pone.0103772.t004
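
For readers unfamiliar with the neutrality statistics listed in Table 4, the following minimal sketch computes Tajima's D from a set of aligned sequences, using the standard constants of Tajima (1989). It is not the software used in this study; the function name `tajimas_d` and the toy alignments are hypothetical, and sites are simply treated as segregating whenever more than one character is observed.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (strings)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    # Star-like toy sample: one central type plus singleton derivatives.
    star = ["AAAAA", "CAAAA", "ACAAA", "AACAA", "AAACA"]
    print(tajimas_d(star))
```

An excess of rare variants, as in the star-like toy sample, gives a negative D, which is the pattern usually read as a signal of population expansion; an excess of intermediate-frequency variants gives a positive D.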

Haplogroup N* is almost exclusively represented by haplogroup
A in our samples. Haplogroup A is widely distributed in northern
and eastern Asia, occurring at frequencies of 5%-10% in different
populations [85]. Haplogroup A also has an average frequency of
nearly 9% on the plateau [5]. Subhaplogroup A4*, which is
mainly found in Central, Northeast and Southwest Asia, is the
most frequent sublineage of haplogroup A in Qiangic, accounting
for 2.17%, 5.00%, 4.20%, and 12.50% of Horpa-Danba, Horpa-
Daofu, Tibetan-Xinlong, and Tibetan-Yajiang, respectively.
Network analysis of haplogroup A4* revealed a star-like pattern and
thus showed a signal of population expansion on the plateau
(Figure 5). The probable population expansion was also supported
by the growth summary statistics for this lineage (Table 4).
Subhaplogroup A11 split from the root of haplogroup A very early and
formed a distinct lineage. A11a and A11b, the two sublineages of
A11, have different distribution patterns. Most of the A11
samples in Tibet belong to A11* or A11a, and only a few have a
control-region substitution at site 16234 and are assigned to A11b.
However, almost all the A11 samples in the Tibeto-Burman
and Han Chinese populations of Yunnan belong to A11b. In the
present study, three of five A11 samples belonged to A11* and the
other two were assigned to A11b.

Discussion 

The Sino-Tibetan linguistic family comprises some 460
languages distributed in East Asia, Southeast Asia, and parts of
South Asia, including the Chinese and Tibeto-Burman subfamilies
[1]. Despite intensive linguistic, archaeological, and genetic
research, where the Sino-Tibetan speakers came from and how
they dispersed remain major open questions. One widely accepted
hypothesis states that the ancestors of the Sino-Tibetan populations
descended from the Neolithic Di-Qiang people in the
upper and middle Yellow River basin. The Di people have gradually
developed into Han Chinese and Qiangic populations since the
collapse of the Later Liang dynasty (one of the Sixteen Kingdoms,
AD 386-403). Here, we integrated the Y chromosome
and mtDNA evidence from Qiangic populations to provide a broader
framework for reconstructing the history of Sino-Tibetan populations.

Figure 5. Reduced Median-joining network based on the HVRI data of mtDNA.

doi:10.1371/journal.pone.0103772.g005 



From the paternal Y chromosome perspective, haplogroup D1-
M15 originated from D*-M174 during its migration into mainland
East Asia [95]. Around 50-60 kya, a subgroup carrying haplogroups
D*-M174 and D1-M15 started a northward migration through
the WSC corridor into present-day Qinghai province, and then
probably moved along the well-known route called the Tibeto-
Burman corridor to enter the Himalayas [95]. Haplogroup D*-
M174 probably gave rise to D3a-P47 in Tibet [95]. Haplogroup
D3a-P47 experienced a recent population expansion on the Tibetan
Plateau, and then probably migrated southward via the WSC
corridor and gradually became the main genetic component of
Tibeto-Burman populations in present-day Sichuan, Yunnan, and
Guangxi. Y chromosome haplogroup D might provide
evidence of late Paleolithic human activity on the plateau.
Genetic relics of the late Paleolithic age have also been detected
on the maternal side, for example, haplogroup M62b. In addition,
a number of Paleolithic sites have been excavated across the
Tibetan Plateau [96-99], documenting the earliest human
presence on the plateau, dated to 20-30 kya.

Around 20-40 kya, a population with dominant haplogroup
O3-M122 Y chromosomes (haplogroups O3a1c-002611, O3a2c1*-
M134, O3a2c1a-M117, and probably other O3 lineages) finally
reached the upper and middle Yellow River basin and formed the
Di-Qiang populations. During the Neolithic period, the Di-Qiang
people experienced a relatively large population expansion. A
subgroup of the Di-Qiang people with dominant haplogroups
O3a2c1*-M134 and O3a2c1a-M117, now called the Proto-
Tibeto-Burman people, left their Yellow River homeland and probably
also moved along the Tibeto-Burman corridor, embarking on
large-scale westward migrations to present-day Qinghai province
and then southward to the Himalayas, or migrating southward
directly via the WSC corridor to Yunnan and Guangxi, where
they mixed with D-M174 lineages and developed into Tibeto-
Burman populations. However, haplogroup O3a2c1*-M134
might have already reached Tibet before the above southward
migration together with O3a2c1a-M117, judging from the high
diversity in the network of O3a2c1*-M134 (Figure 4). In addition,
another branch of the Di-Qiang people, the proto-Chinese, with
dominant haplogroup O3a1c-002611, migrated eastward to the
central China plain, in the middle and lower Yellow River
Valley, and integrated gradually with the natives (probably
populations with haplogroup C-M130 or D-M174) around 5-
6 kya. Subsequently, the Di-Qiang people who resided in the upper
and middle Yellow River basin with haplogroups O3a2c1*-M134
and O3a2c1a-M117 formed the well-known Yan-Huang tribe
(Yan Emperor and Yellow Emperor), and the eastward branch
with O3a1c-002611 developed into the Dong Yi tribe. The Yan-
Huang tribe together with the Dong Yi tribe gradually developed
into a large population known as Han Chinese. With the
expansion of Han Chinese, especially southward, this group
became the largest of the 56 officially recognized ethnic
groups in China.

The role that haplogroup O3-M122 lineages played in the origin
of Tibeto-Burman populations suggests extensive genetic
input from northern Asians. This suggestion has been supported